Added chat & completions API #10
base: master
Conversation
"stream": stream, | ||
} | ||
|
||
base_url = "https://vllm-tt-dev-49305ac9.workload.tenstorrent.com/v1" |
We should create an ingress route that would be standard for EdenAI vLLM deployments. The workload I tested is currently in vllm-tt-dev under the tenstorrent team, but I'll move it to the EdenAI team.
yes
I think we might also need to add some taints on the LLMBoxes in k8s that we specifically want to reserve for Eden, maybe?
"max_tokens": max_tokens, | ||
} | ||
|
||
base_url = "https://vllm-tt-dev-49305ac9.workload.tenstorrent.com/v1" |
Same here: we should create an ingress route that would be standard for EdenAI vLLM deployments, and I'll move the vllm-tt-dev workload from the tenstorrent team to the EdenAI team.
Great progress. Here are some thoughts.
"stream": stream, | ||
} | ||
|
||
base_url = "https://vllm-tt-dev-49305ac9.workload.tenstorrent.com/v1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
    }

    base_url = "https://vllm-tt-dev-49305ac9.workload.tenstorrent.com/v1"
    client = OpenAI(base_url=base_url, api_key=self.api_key)
I like that you are using the OpenAI client, but I'm not sure the client should be redefined on every call like this. Remember that when the client is defined, that initializes a session; if we want sticky sessions to work, the same client needs to be re-used by the same user.
Note that the example edenai-api by openai you shared earlier also uses the client, and the client is saved to the class as a member of self. I'm actually confused where the client is initialized. Can you figure out where it is initialized? We should probably do similar.
Also, just noting for comparison that the openai implementation of text__generation does not use the client and just uses the requests module, but I think we should use the client.
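For illustration, here is a minimal sketch of the reuse pattern being suggested. The class and method names are placeholders, not the actual EdenAI code; the point is that the client is constructed once and then shared by every call, so the underlying HTTP session (and any sticky-session routing keyed to it) is preserved.

```python
from openai import OpenAI

class TenstorrentApi:  # hypothetical provider class, for illustration only
    def __init__(self, api_key: str, base_url: str):
        # Create the client once; it owns a persistent HTTP session,
        # so reusing it keeps connections (and sticky sessions) alive.
        self.client = OpenAI(base_url=base_url, api_key=api_key)

    def text__chat(self, messages, model, max_tokens, stream=False):
        # Reuse self.client instead of constructing OpenAI(...) per call.
        return self.client.chat.completions.create(
            model=model,
            messages=messages,
            max_tokens=max_tokens,
            stream=stream,
        )
```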
Ahh, great point! You are right that if we want to leverage sticky sessions/kv-cache reuse, then we probably shouldn't instantiate the client again and again.
After looking a bit at how OpenAI did it, the client used by the text__chat method in the OpenaiTextApi class is initialized in the OpenaiApi class constructor here:
https://github.com/edenai/edenai-apis/blob/aa32409084469b81be98989964b0ab82d7326ccc/edenai_apis/apis/openai/openai_api.py#L55
This happens during the instantiation of the OpenaiApi object in openai_api.py. The OpenaiApi class inherits from OpenaiTextApi, so the self.client object is accessible to text__chat.
I'll try to modify our code along similar lines.
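In other words, the structure is roughly the following (a stripped-down sketch using the class names mentioned above; the method bodies are simplified placeholders, not the actual edenai-apis implementation):

```python
from openai import OpenAI

class OpenaiTextApi:
    # Mixin-style class: it assumes the concrete API class that inherits
    # from it sets self.client in its constructor.
    def text__chat(self, messages, model):
        return self.client.chat.completions.create(model=model, messages=messages)

class OpenaiApi(OpenaiTextApi):
    def __init__(self, api_key: str):
        # The client is created once here, during instantiation of OpenaiApi;
        # inherited methods like text__chat then access it via self.client.
        self.client = OpenAI(api_key=api_key)
```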
Done.
For the other note you mentioned, about openai using requests for text__generation rather than the OpenAI client: our tenstorrent code already uses the OpenAI client for both text__chat and text__generation.
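Continuing the hypothetical TenstorrentApi sketch from above, text__generation can go through the legacy completions endpoint on the same shared client, so both methods reuse one session (parameter names here are illustrative):

```python
    def text__generation(self, prompt: str, model: str, max_tokens: int):
        # Plain-text completions via the same shared client,
        # instead of hand-rolled calls with the requests module.
        return self.client.completions.create(
            model=model,
            prompt=prompt,
            max_tokens=max_tokens,
        )
```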
@rreece One thing I am confused about is whether the name of this file should be set to …
Similar to the other examples from other providers, the name of the file saved to the repo should be …
Cool, sounds like I have been doing it the right way then. Thanks for confirming.